HoloViews¶
HoloViews is a high-level plotting library for Python that excels at creating interactive plots with an expressive syntax and minimal effort. HoloViews is most often used with the Bokeh plotting library as its backend, which provides a rich ecosystem of plotting utilities and glyphs, but it also supports other plotting backends such as Matplotlib and Plotly. HoloViews integrates with data analysis libraries such as NumPy and pandas so that users can generate HoloViews plots within their typical workflow.
Visit the gallery of reference plots: http://holoviews.org/reference/index.html
import pandas as pd
import numpy as np
import holoviews as hv
hv.extension('bokeh')
# Load data
df = pd.read_csv("../datasets/global_wheat.csv")
# Inspect the first rows of the DataFrame to confirm the data loaded correctly
df.head()
| | year | country | area_has | production_ton | yield_kg |
|---|---|---|---|---|---|
| 0 | 1961 | Argentina | 4420900 | 5725000 | 1295.0 |
| 1 | 1962 | Argentina | 3744700 | 5700000 | 1522.2 |
| 2 | 1963 | Argentina | 5676000 | 8940000 | 1575.1 |
| 3 | 1964 | Argentina | 6135400 | 11260000 | 1835.3 |
| 4 | 1965 | Argentina | 4601200 | 6079000 | 1321.2 |
First line plot¶
To create a line plot, the HoloViews library exposes an element called Curve (note the upper case C). The syntax for a simple plot is: hv.Curve( (x, y) ). Note that we need to pass the x and y variables as a tuple (x,y). The following will not work: hv.Curve(x, y)
Let’s start with a simple example. We will create a new, shorter DataFrame with data for one country. Short datasets like this will help you familiarize yourself with the HoloViews syntax for creating simple plots. Below you can find more advanced charts.
# Select all years for the United States
idx_usa = df['country'] == 'USA'
df_usa = df.loc[idx_usa,:]
# With this new dataframe, plot the wheat production in metric tons
hv.Curve(df_usa, 'year', 'production_ton')
The figure we just generated already looks great and is interactive. Much of this is thanks to the underlying Bokeh library. We could have used Bokeh directly, but for simple plots like this one HoloViews builds the Bokeh figure for us with far less code.
You probably noticed that the figure does not seem to have the proper aspect ratio for a time series, and that the axis labels contain the default column names of the pandas DataFrame. In other words, all the plot settings are at their default values. We can change different chart attributes by passing additional options.
In HoloViews, options live in a different space, separate from that of the variables or dimensions that we used to create the plot. Another nice feature is that most attribute settings are keywords. Even colors are words. So if you want something that is red, then simply write color='red'
We will also include in our next chart the key dimensions kdims and the value dimensions vdims. The syntax remains the same as in the previous plot, but we are exposing some arguments that will help us create plots in the future and that bring this plot more in line with the official documentation.
# Specify additional options
hv.Curve(df_usa, kdims='year', vdims='production_ton').opts(width=400,
height=300,
title="Wheat Production",
xlabel='Year',
ylabel='Metric Tons',
bgcolor='white',
color='red')
kdims vs vdims¶
Why not use x and y?
In short, kdims represent the set of independent variables and vdims represent the set of dependent variables.
To some extent they represent the same thing: kdims = x and vdims = y, but HoloViews semantics are meant to be more abstract and expressive. For most line, scatter, and bar charts you can assume that kdims = x and vdims = y. For other plots, like heatmaps, the key dimensions are both x and y, and the value dimension is whatever magnitude we want to place in each cell of the heatmap.
From now on we will include the keyword arguments kdims and vdims in most plotting lines.
For instance, to create a heatmap of wheat grain yield for each year and country in the DataFrame we would need to pass the following arguments to hv.HeatMap: kdims=['year','country'] and vdims=['yield_kg']. From the HoloViews standpoint there is no x or y, but rather variables that we want to use as keys and variables that we want to use as values. This paradigm makes creating figures with HoloViews more expressive. Visit the heatmap below to see it in action.
When passing data using the traditional x,y plotting syntax it is important to pass the x and y variables within a tuple. For instance, the following code will work:
hv.Curve( (df_usa.year, df_usa.production_ton) )
but the following code in which the x and y variables are not within a tuple will throw an error.
hv.Curve( df_usa.year, df_usa.production_ton )
`kdims argument expects a Dimension or list of dimensions, specified as tuples, strings, dictionaries or Dimension instances, not a ndarray type. Ensure you passed the data as the first argument.`
Note that the error message from HoloViews is actually easy to understand and provides useful information on how to solve the error.
# This code will work
hv.Curve( (df_usa.year, df_usa.production_ton) ).opts(width=400)
Note
As described in the previous few examples, HoloViews provides different options for plotting tabular data. The same syntax can be applied to NumPy arrays.
HoloViews Datasets¶
Converting a pandas DataFrame to a HoloViews Dataset exposes new machinery for selecting, renaming, and aggregating data in a format that is ready to use with HoloViews.
Note
As we did in our first line plot, it is not necessary to work with HoloViews Datasets. Remember that HoloViews integrates seamlessly with Pandas and we can easily do all the data slicing and selection with Pandas.
# Convert to a HoloViews Dataset to enable more high-level features
ds = hv.Dataset(df, kdims=['year'], vdims=['country','area_has','production_ton','yield_kg'])
The new dataset has some interesting functions. For instance we can use the select and aggregate methods to further narrow the data that we intend to display and to easily synthesize variables.
# Use the HoloViews Dataset to slice the data for the USA
hv.Curve(ds.select(country='USA'), 'year', 'production_ton').opts(width=400)
If the addition of kdims and vdims did not convince you that this library can be more expressive, then let’s explore the way in which we can create overlays of different traces, subplots, and dashboards.
Overlays¶
The * operator overlays multiple traces on the same axes. Wrap the overlay expression in parentheses so that the options are applied to the final figure rather than only to the last element.
# One plot with two lines
hv_yield_usa = hv.Curve(ds.select(country='USA'), 'year', 'yield_kg', label='USA')
hv_yield_china = hv.Curve(ds.select(country='China'), 'year', 'yield_kg', label='China')
hv_yield_argentina = hv.Scatter(ds.select(country='Argentina'), 'year', 'yield_kg', label='Argentina')
hv_yield_australia = hv.Scatter(ds.select(country='Australia'),'year','yield_kg', label='Australia')
(hv_yield_usa * hv_yield_china * hv_yield_argentina * hv_yield_australia).opts(
xlabel='Year',
ylabel='Yield (kg/ha)',
width=600,
min_height=300,
responsive=True,
legend_position='right')
Layout¶
Use the + operator to concatenate plots.
This is another convenient trick of the HoloViews library. The + operator is a powerful and succinct syntax to create grids. Figure properties can be set for each subplot through its own options. An added bonus is that zooming and panning occur simultaneously in all the subplots that share the same dimensions, which allows direct comparison of the data.
# Grid with 4 subplots
# First let's place all the subplots into a variable
grid_yield = hv_yield_usa + hv_yield_china + hv_yield_argentina + hv_yield_australia
# Then we need to tell HoloViews the layout for the subplots
grid_yield.cols(2)
Interactive App¶
We will use the .to() method to map the variables of a HoloViews Dataset into an interactive app. In literally one line of code we can explore our data in a completely different way compared to traditional plotting libraries.
The groupby argument will create a dropdown menu for categorical variables (e.g. countries) and a slider for numeric variables (e.g. year).
# A simple app to explore data interactively
curves_app = ds.to(hv.Curve, kdims=['year'], vdims=['production_ton'], groupby='country').opts(width=500)
curves_app
Which country has had the most impressive increase in wheat production from 1961 to 2011? Perhaps the following interactive plot will help you answer this question.
# A similar application grouping by year.
bars_app = ds.to(hv.Bars, kdims=['country'], vdims=['production_ton'], groupby='year').opts(width=500,
xrotation=45)
bars_app
Bar charts¶
We can also select specific attributes in our data and then pass them into a plotting function. For instance, in the following example we first slice the data for the countries and years that we want using ds.select(), and then we pass the selected data using the .to() method to generate the chart of grouped bars.
countries = ['Argentina','Canada','USA']
years = [1961,1971,1981,1991,2001,2011]
bars = ds.select(country=countries, year=years).to(hv.Bars, ['year', 'country'], 'yield_kg').opts(
width=500, height=400, tools=['hover'], xrotation=90, show_legend=False)
bars
# Contribution of each country to the annual production
ds.to.bars(['year','country'], 'production_ton', []).opts(height=400,width=800,
stacked=True,legend_position='right',
xrotation=90, ylabel='Production (Metric Tons)',
title='Wheat Production')
Errorbars and Spread area charts¶
We can also examine the average wheat yield across countries, and its variability for each year, using an error bars plot.
ds_yield = hv.Dataset(df, kdims=['year'], vdims=['yield_kg'])
agg = ds_yield.aggregate('year', np.mean, np.std)
(hv.Curve(agg) * hv.ErrorBars(agg)).opts(height=300, width=600)
# A similar plot can be generated to represent continuous bounded lines
(hv.Curve(agg) * hv.Spread(agg)).opts(height=300, width=600)
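To make the aggregation step concrete, here is a pandas-only sketch of a per-year mean and standard deviation, computed on made-up yields. It is an approximation of what the aggregate call summarizes, not a reproduction of it. Note that pandas reports the sample standard deviation (ddof=1), whereas np.std on a plain array defaults to ddof=0, so the spread values may differ slightly depending on how the function is applied.

```python
import pandas as pd

# Made-up yields for three countries in two years (illustrative only)
toy = pd.DataFrame({
    "year": [1961, 1961, 1961, 1962, 1962, 1962],
    "yield_kg": [1000.0, 1500.0, 2000.0, 1200.0, 1800.0, 2400.0],
})

# Mean and sample standard deviation of yield for each year
summary = toy.groupby("year")["yield_kg"].agg(["mean", "std"])

mean_1961 = summary.loc[1961, "mean"]
std_1961 = summary.loc[1961, "std"]
print(summary)
```

The mean column corresponds to the line drawn by hv.Curve, and the spread column corresponds to the extent of the error bars or shaded band around it.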
Boxplot¶
We can also explore the central tendency and dispersion of wheat yields over five decades for each country. Taller boxes indicate larger variability, which can result from environmental factors as well as from changes in management and technology. Assuming that countries like Australia, Canada, Argentina, and the US have access to state-of-the-art agronomic knowledge and farming technology, the compact boxes likely indicate that national average wheat yields have been close to the maximum attainable yield for the past several decades.
hv.BoxWhisker(ds,'country','yield_kg').opts(width=500, height=400, xrotation=45, ylabel='Yield (kg/ha)')
# Plot cultivated area by country
hv.BoxWhisker(ds,'country','area_has').opts(width=500, height=400, xrotation=45, ylabel='Area (Hectares)')
Heatmap¶
We can also explore the increase in grain yield in kg per hectare for all years and all countries using a heatmap.
hv.HeatMap(ds, kdims=['year','country'], vdims=['yield_kg']).opts(
height=300,width=800,colorbar=True,colorbar_opts=dict(title='Yield (kg/ha)'),toolbar='above')
In case you are not sure about the arguments of heatmaps (or any other plotting function), you can always access remarkably detailed documentation by typing hv.help(hv.HeatMap).
Gapminder¶
We can also generate an application similar to the popular Gapminder visualization by Hans Rosling, but for global wheat production.
from holoviews import dim, opts
import numpy as np
hv.output(widget_location='bottom') # This will re-position the slider for better visualization
popts = opts.Points(alpha=0.6,
legend_position='right',
height=400,
width=700,
show_grid=True,
color='country',
cmap='Set1',
line_color='black',
xlabel='Production (Metric Tons)',
ylabel='Grain Yield (kg/ha)',
size=np.sqrt(dim('area_has')/10e3))
hvapp = ds.to(hv.Points, ['production_ton','yield_kg'], groupby='year').opts(popts)
hvapp
The interactive visualization reveals whether the increase in wheat production is a consequence of the increase in yield (Y axis), or a consequence of more cultivated area (bubble size), or a combination of both.
Note
Because we had to adapt the size of the marker to the cultivated area using np.sqrt, changes over time in this variable may not be as evident as changes in grain yield.
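The effect of the square-root scaling can be checked numerically. This sketch uses made-up areas. Note that 10e3 in the expression is the same number as 10000.

```python
import numpy as np

# Made-up cultivated areas in hectares: each is 4x the previous one
area_has = np.array([1e6, 4e6, 16e6])

# Same scaling as in the plot options (10e3 == 10000)
marker_size = np.sqrt(area_has / 10e3)

# Relative marker size compared with the smallest area
size_ratio = marker_size / marker_size[0]
print(marker_size, size_ratio)
```

A fourfold increase in area only doubles the marker size, which is why changes in cultivated area look more subtle than changes along the axes.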
Here is the same example using the Scatter element rather than Points:
ds.to(hv.Scatter, kdims=['production_ton'], vdims=['yield_kg','area_has','country'], groupby='year').opts(
alpha=0.6,
color='country',
size=np.sqrt(dim('area_has')/10e3),
legend_position='left',
height=400,
width=700,
show_grid=True,
cmap='Set1',
line_color='black',
xlabel='Production (Metric Tons)',
ylabel='Grain Yield (kg/ha)')
A typical pitfall in HoloViews consists of trying to use a variable that was not declared in kdims or vdims. For instance, the following example, in which we want to set the size and color of the markers in the scatter plot based on area_has and country, will not work because neither variable was included in vdims. The mistake comes from assuming that the data is available to the plot simply because these columns exist in ds.
ds.to(hv.Scatter, kdims=['production_ton'], vdims=['yield_kg'],groupby='year').opts( size=np.sqrt(dim('area_has')/10e3))
Histograms and Distributions¶
# Histogram for a single country applying a boolean index similar to what we would do in Pandas
idx_argentina = ds['country']=='Argentina'
hv.Distribution(ds.iloc[idx_argentina], kdims='yield_kg').opts(xlabel='Yield (kg/ha)')
Because of the expressive and convenient syntax we can easily select data and create subplots using a for loop.
# Initialize an empty list to collect the subplots
subplots = []
# Iterate over each country in the table (selected using the unique function)
# Note that we store plot options in each iteration
for country in np.unique(ds['country']):
    idx = ds['country']==country
    plot = hv.Distribution(ds.iloc[idx], kdims='yield_kg').opts(title=country,xlabel='Yield (kg/ha)')
    subplots.append(plot)
# Generate grid with all plots.
hv.Layout(subplots).cols(2)
# Violin plots
violin = hv.Violin(ds, 'country', 'yield_kg').opts(height=300, width=400)
violin
Images¶
We can also import and display RGB images. Images need to be NumPy arrays.
Use hv.RGB for images with red, green, and blue channels (x,y,r,g,b)
Use hv.Image to plot 2D arrays (x,y,z). Typical examples are gray images or a matrix with false color intensity.
It is recommended that you use the option aspect='equal' to prevent stretching of the image.
RGB = hv.RGB.load_image('../datasets/geometric_shapes.png').opts(aspect='equal')
RGB
channels = (
hv.Image(RGB[:,:,'R'],label="Red").opts(colorbar=True,aspect='equal',cmap='gray',frame_width=200) +
hv.Image(RGB[:,:,'G'],label="Green").opts(colorbar=True,aspect='equal',cmap='gray',frame_width=200) +
hv.Image(RGB[:,:,'B'],label="Blue").opts(colorbar=True,aspect='equal',cmap='gray',frame_width=200)
)
channels.cols(1)